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Abstract 

A lattice based model of a protein is used to study the dimerization equi- 
librium of the GCN4 leucine zipper. Replica exchange Monte Carlo is used 
to determine the free energy of both the monomeric and dimeric forms as a 
function of temperature. The method of coincidences is then introduced to 
explicitly calculate the entropy loss associated with dimerization, and from 
it the free energy difference between monomer and dimer, as well as the cor- 
responding equilibrium reaction constant. We find that the entropy loss of 
dimerization is a strong function of energy (or temperature), and that it is 
much larger than previously estimated, especially for high energy states. The 
results confirm that it is possible to study the dimerization equilibrium of 
GCN4 at physiological concentrations within the reduced representation of 
the protein employed. 
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I. INTRODUCTION 



Reduced models of a protein have been shown to provide a possible route for the esti- 
mation of the free energy of dimerization of relatively short coiled-coils . A key step in 
the calculation of the free energy of dimerization concerns the entropy loss upon bringing 
two monomer chains together to form the dimer. A practical method for the numerical 
estimation of this entropy loss by Monte Carlo simulation is discussed in this paper. 

The focus of our work is on the calculation of free energies of dimerization, and in 
particular a re-analysis (using a subsequently improved model) of prior research about the 
folding thermodynamics of the GCN4 leucine zipper [@,[|. Leucine zippers belong to the 
class of structural motifs that are known as coiled coils. Generically, they comprise right 
handed a helices wrapped around each other with a small left-handed super-helical twist 
||. While leucine zippers can exist in both monomeric or dimeric form ||, GCN4 forms a 
dimer in the crystalline phase ||. 

Numerical calculations of the free energy difference between the monomeric and dimeric 
forms of GCN4 have already been given in [@,|J. In ref. 0, the free energy of the monomer 
was computed by transfer matrix methods, whereas the entropy change of dimerization was 
estimated by Monte Carlo methods. Two monomers were placed in a parallel configuration, 
and the configurational partition function was estimated by placing the two monomers in 
registry with one another, but considering different position of the relative starting point 
of each segment. The Entropy Sampling Monte Carlo method of Hao and Scheraga || 
was used in ref. to sample the configurational space of both monomer and dimer. By 
using the values of the various terms that contribute to the entropy loss of dimerization 
given in |2j , the equilibrium dimer fraction was computed as a function of temperature and 
monomer concentration. As a byproduct of the calculation, other equilibrium quantities 
such as the helical content of the monomer and dimer were obtained, as well as an analysis 
of the existence of possible folding intermediaries. 

In our present work, we use the Replica Exchange Monte Carlo method |7|-f| to obtain the 
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canonical distribution of both monomer and dimer forms separately. Re-weighting methods 
are then used to calculate the respective free energies as a function of temperature pH|,|TT 



In the final step, a method is described to place both free energies on the same relative scale 
so that the free energy difference between the monomer and dimer forms can be computed. 
We believe that the method described in this paper allows a more accurate determination 
of the entropy loss of dimerization than previously reported, and therefore it allows a more 
accurate determination of the free energy difference between the monomeric and dimeric 
forms as a function of temperature and concentration. 

II. THE PROTEIN MODEL 

Our analysis is based on a a reduced model in which a protein is represented by a sequence 
of virtual bonds connecting effective particles |i"2] , |T3] |. Each of these particles is assumed to 



be located at the center of mass of the side chain and backbone a carbon. The effective 
particles are embedded in a regular cubic lattice of fixed spacing that allows for a fairly 
accurate representation of the backbone of known protein structures. This geometric part 



of the model has been checked against all structures contained in the protein data bank [14 



The observed root mean squared deviation between the lattice representation of any protein 



and its resolved structure is typically below 0.8 A [[IIJ. The actual resolution of the model 
is of course lower, typically of the order of 2 Afor small proteins. 

Effective interactions (force fields) are introduced among the particles that include 
generic (sequence independent), and sequence specific contributions. The potentials as- 
sociated with the generic type of interactions are defined so as to introduce a bias toward 
reasonable secondary structures. One such potential is introduced to account for the fact 
that proteins exhibit a characteristic bimodal distribution of neighbor residue distances, 
specially between the i-th and (i+4)-th residues. Configurations corresponding to the larger 
distance in the distribution are associated with proteins that exhibit either /5-type or ex- 



panded coils, whereas the shorter distance corresponds to helices and turns ||15||. A second 
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generic interaction further introduces a bias toward certain packing structures such as he- 
lices and /3-type states. The first potential produces the required stiffness of the polypeptide 
chain, whereas the second provides for local cooperative motion during packing. 

Sequence specific interactions are of three types, and include short range interactions, 
long range, pairwise interactions, and many body interactions. The short range pairwise 
potentials are of statistical origin and are fitted to nonhomologous reference structures. 
This is accomplished by considering the known distances between pairs of aminoacids that 
are separated by one through four bonds along the chain. Chirality is also introduced by 
considering an interaction between the i-th and (i+3)-th and (i+6)-th bonds to produce 
the correct pitch. Long range interactions are also of statistical origin, and are assumed to 
depend not only on relative distances between the effective atoms, but also on the relative 
orientation between the corresponding bonds (for example, residues of opposite charges are 
attractive when the corresponding bonds are parallel to each other, whereas the interaction 
is weak or repulsive when they are anti-parallel). 

Finally, although multi-body interactions are implicitly included (as an unknown contri- 
bution) in the pairwise potentials determined from inter-residue distances and bond angles, 
two additional terms are added to model hydrophobic interactions and the known probability 
of a residue to have a given number of parallel and anti-parallel contacts. The hydrophobic 
potential is estimated from the surface exposure of a given side chain, i.e., of all possible 
contacts of a side chain, those that are not effectively occupied by contacts with neighbor- 
ing chains. The second multi-body potential introduces a bias toward known propensities 
of various aminoacids to pack their side chains in parallel or anti-parallel orientation. Re- 
cent applications of the methodology include the improvement of threading based structure 
prediction [[0|, and direct ab-initio folding studies jl6H . 
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III. MONOMER-DIMER EQUILIBRIUM AND MONTE CARLO METHOD 



At constant temperature and in thermodynamic equilibrium, the concentration of freely 
associating monomers and dissociating dimers is governed by the reaction constant, K = 
[D]/[M] 2 , where [M] and [D] are the concentrations of monomers and dimers respectively 
The mol fraction equilibrium constant can be expressed in terms of the canonical partition 
functions of both monomer and dimer as [ 17fl , 



Z D AT Q 



K x = c K = N^- = N-^, (1) 

where N is the total number of chains (i.e., two chains per dimer), cq = N/V the chain 
concentration in a system of volume V, and Zm and Zjj the canonical partition functions 
of the monomer and dimer. The momenta degrees of freedom of both monomer and dimer 
can be integrated out from their partition functions, and the respective kinetic contributions 
cancel. We have therefore introduced the configurational partition functions Qm and Qd, 
as well as the symmetry factor that takes into account the indistinguishability of the two 
chains that form the dimer (<td = 2! in the present case) fll7| . The purpose of the present 
paper is an estimation of Qm and by the Monte Carlo method. 

Although it is in principle possible in to estimate the ratio of configurational partition 
functions in Eq. ([J) by direct simulation involving coexisting monomers and dimers that 
transform into each other by association or dissociation, we have found that this is not 
practical. First, proteins are large molecules (even in the reduced representation used in our 
work and described in Appendix |A|), and only a small number of them can be placed in a 
computational cell. Second, the protein models employed lack very long range interactions, 
and hence there are large entropic contributions to the free energies which are difficult 
to sample accurately over the large configurational space of a protein. We instead follow 
the approach of refs. [^]|| which consists of two steps: an independent computation of the 
free energies of the monomer and dimer forms, followed by a transformation to a common 
reference state so that the free energy difference between the two can be estimated. 
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The configurational partition functions Qm and Qd are estimated by the Replica Ex- 
change Monte Carlo method J7|. Either a single monomer or a single dimer are placed 
in the computational cell, and a set of canonical Monte Carlo simulations are performed 
at set of prescribed neighboring temperatures. By conducting simulations involving only 
one monomer or one dimer we are explicitly assuming that at physiological concentrations 
monomer-dimer or dimer-dimer interactions can be neglected. The simulation to obtain Qd 
involves two identical monomers constrained during the course of the simulation to states 
in which there is at least one inter chain contact. 

We consider a set of r independent canonical simulations conducted in parallel at a set 
of neighboring inverse temperatures fa, (i — 1, . . . , r). At fixed intervals during the course of 
the simulation, two configurations at different temperatures ( "replicas" ) are chosen at ran- 
dom, and their respective temperatures exchanged with probability defined so as to preserve 
detailed balance as given by the canonical probability distribution. Additional details about 
the so-called Replica Exchange Monte Carlo method can be found in ref . |7j . At each inverse 
temperature fa, a sample of rii statistically independent configurations is collected, and the 
corresponding energy histograms hi(E) calculated with some arbitrary energy binning AE. 



Re- weighting is then used to estimate the partition function by simultaneously solving [10 



e^ = AEj2p(E,fa), (3) 

E 

where = faF(fa) is the dimensionless thermodynamic free energy at inverse temperature 
fa. As is standard, free energy and canonical partition function are related through f\ = 
— In Q(fa). Also, for latter reference, we note that the configurational density of states W(E) 
is given by, 

W(E) =p(E,!3)e pE , 
and therefore the configurational entropy can be obtained as, 

—f—}- = In (W(E)AE) 
6 



where ks is Boltzmann's constant. Two sets of simulations are performed to yield monomer 
{/m,;} and dimer {fo,i} free energies at the set of inverse temperatures {Pi}. Note that each 
set is known up to an arbitrary additive constant. 

In the remainder of this Section, we describe a number of transformations of the con- 
figurational partition functions Qm arid Qd either for computational convenience or for the 
calculation of the proper reference state. Given the assumed integration of particle mo- 
menta, both Qm and Qd are expressed in terms of individual particle coordinates, and 
have dimensions of V M and V 2M respectively, where M is the number of aminoacids in the 
monomer. Elimination of rigid translation or rotation degrees of freedom from the configu- 
rational partition functions, for example, is usually accomplished by introducing the center 
of mass and principal axes of inertia of the molecule, and then relative coordinates for the in- 
dividual particles. This requires either integrals over the corresponding canonical momenta, 
or the explicit consideration in the coordinate integrals of the appropriate transformation 
Jacobians. Since the Jacobians are configuration dependent, we follow instead earlier work 
and conduct all our transformations on single particle coordinates alone thus obviat- 
ing the need to introduce complicated Jacobian functions. For example, the elimination of 
the degrees of freedom associated with uniform translations is accomplished by eliminating 
the motion of particle 1 in one of the chains. The elimination of rigid rotation is partially 
accomplished by disallowing the rotation of the bond between particles 1 and 2 of one of 
the chains. Finally, in our estimate of entropy losses on dimerization (Section A| ) , we 
exclusively use phase space volumes of single particle coordinates to maintain the necessary 
dimensional consistency of Eq. ([!]) after all the transformations of both Qm and Qd that 
are described in this Section. 

The configurational partition functions Qm and Qd are independent of the location of 
the molecule and of its orientation. The statistical accuracy of the simulation is greatly 
increased if those degrees of freedom that correspond to rigid translations and rotations 
are eliminated. The translational degree of freedom is eliminated by fixing the location of 
the first particle in the monomer chain, or of chain one in the dimer. Since the partition 
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function is independent of this particle's location, this coordinate can be integrated out to 
yield a factor of V to the configurational partition function. The state of rigid rotation of 
the molecule can be specified by three angles, or the orientation of one axis plus a rotation 
around this axis. The orientation of the axis is defined by its azimuth a G (— 7r, 7r) and 
a polar angle 9 G (0, 7r). The rotation around this axis is given by a third angle 7 G 
(— 7r,7r). Therefore the corresponding element of volume in configuration space is given by 
the triple integral / da dcos((3) = 8ir 2 . This value can also be exactly factored out 
from the configurational partition function. However, given that the Monte Carlo method 
used employs multiple bond transitions, we have found it convenient to proceed somewhat 
differently. We disallow the Monte Carlo transition that corresponds to a two-bond change 
at the N-terminus, and therefore effectively eliminate the motion of particles 1 and 2 of the 
chain. While fixing the location of particle 1 still allows an exact calculation of the partition 
function, eliminating the motion of particle 2 introduces two approximations. The first one 
involves the factorization of the configuration space volume of particle 2. Since the motion 
of this particle is not independent of the motion of the rest of the chain, this factorization 
is only approximate. The degrees of freedom that are eliminated include a rotation around 
an axis defined by the bond vector between particles one and two, plus fluctuations in 
bond vector length. The elimination of the rotation is exact since the partition function is 
independent of the orientation of this axis, and yields a factor of / dadcos((3) = Att to the 
overall partition function. Factorization of the configuration space volume associated with 
bond length fluctuations is only approximate. We write, 

Q ~ VV^Q' ~ vf (Ri ax - R 3 min ) Q', (4) 

where Q' is the partition function that is actually computed during the Monte Carlo simu- 
lation, and is the constant accessible volume for particle 2. The second approximation 
made involves the assumption that the second atom is free to move within a spherical shell 
centered in the first atom, of inner radius R m in = 4.35 A and outer radius R max = 7.94 A. 
These two values are the smallest and largest bond distances allowed in the lattice model 

8 



used. Finally, note that the computed partition function Q' still contains a factor of 2ir 
corresponding to the angle 7, the unrestricted rigid rotation of the molecule around the axis 
defined by the bond vector between particles one and two that is not eliminated during the 
simulation. 

A. Reference state calculation 

In order to introduce a common scale for the monomer and dimer free energy sets 
and {/d,j}, we follow the method of ref. 0. At sufficiently high energies, one may assume 
that inter-chain interactions are negligible, and that the internal motions within each dimer 
chain are well approximated by those of the monomer. Therefore, and in this limit, the 
entropy of the dimer is approximately twice that of the monomer. This fact allows one to 
place the entropies from both monomer and dimer simulations in the same reference state 
at high energy, and hence to compute free energy differences between the two. 

We briefly summarize here the steps taken for both monomer and dimer. The calculation 
of the internal entropy of the monomer is straightforward. In the dimer case, however, the 
contributions from the internal modes have to be separated from other degrees of freedom 
related to the relative position and orientation of the two chains. Since in the dimer simu- 
lation the two chains are constrained to have at least one contact, the computed entropies 
of the dimer at high energies still contain entropy losses due to this constraint that have to 
be estimated and subtracted to compute its internal entropy. 

1. Monomer 

The monomer partition function Q' M computed by the Monte Carlo method still contains 
the contribution of one degree of freedom that is not associated with the internal motions of 
the particles, and that corresponds to a rigid rotation of the molecule around the axis defined 
by the bond between particles 1 and 2. From the Monte Carlo simulation and re-weighting, 
we obtain the probability density p'(E,(3) that corresponds to the partition function Q' M in 
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Eq. (f|). The corresponding entropy is, 

S'm(E) 



In (p'(E,/3)AE)+/3E, 



kB 

a quantity that is independent of the inverse temperature f3. The internal entropy follows 
by subtracting the entropy of rotation of the azimuth angle of bond 2-3, and by adding the 
estimate given above for the radial part of the first bond, 

S M ,int( E ) = ln (3)AE) + (3E- 1h(2tt) + In ^ max ~ . (5) 



A- 



B 



2. Dimer 

The dimer simulation is conducted by fixing the positions of atoms 1 and 2 of one of the 
chains. Therefore, a Monte Carlo estimate is obtained for Q' D as given in Eq. ([|), As was 
the case for the monomer, we first define the entropy as estimated from the simulation by 

^^- = \n(p'(E,(3)AE)+PE. 
k B 

where p'(E,/3) corresponds to Q' D above. At sufficiently high energies, where the internal 
degrees of freedom of each chain are expected to become independent of the relative position 
and orientation of both chains, the total conformational density of states factors into a 
product involving the various contributions. In terms of the entropy, this factorization leads 
to the decomposition, 

Sd ^ E ^ = l n (p>(E, (3)AE) +(3E- 1h(2tt) + In ^ max ~ R ^ - ln vf } (E) 
k B 3 

-ln{ Va {E) Vcos p{E)^{E)). (6) 

The quantity Vi(E) is the accessible volume of particle 1 of chain 2 at energy E, and hence 
yields the accessible volume loss of dimerization. Its estimate during the Monte Carlo run is 
one of the main topics of this paper. Since the motion of the second chain relative to the first 
is constrained so that the number of inter-chain contacts is greater than zero, this volume will 
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be in general much less than V. The quantity <p a {E) is the configuration volume available 
for the azimuth of bond 1 of chain 2, y2 cos /3 for the cosine of the polar angle of bond 1 of 
chain 2, and </2 7 for the azimuth of bond 2 of chain 2. The product of the three represents the 
loss of rotational configuration space volume due to the formation of a dimer. If the second 
chain were to rotate freely relative to the first, we would have y9 Q (i?)y? COSj g(-E)</3 7 (-E) = 87r 2 . 
The value found is less that this upper bound, but it approaches 8tt 2 as the energy of the 
dimer is increased. As was the case with V^^E), these three quantities also need to be 
estimated during the simulation. 



3. Configuration space volume estimation 

(2) 

The configuration space volumes Vi (E),{p a (E),{p cos/ 3(E) and <£> 7 (-E) have been esti- 
mated by using the method of coincidences |18|| . Let V be the volume of a certain region 
of configuration space. Consider a finite sample of configurations that are uniformly dis- 
tributed in T, and let To T be a small coarse-graining volume in configuration space. The 
method involves computing the coincidence rate R that a pair of configurations in the sample 
belongs to the same coarse-graining volume. If the configurations are uniformly distributed 
in T, the probability of a coincidence is R = Tq/T. Therefore an estimate of R = n c /n t , 
where n t is the total number of pairs in the sample, and n c the total number of coincidences 
given To, allows an estimation of T. 

In order to satisfy the conditions of the method, we first group all configurations (re- 
gardless of their temperature) according to their energy. Since all the configurations with 
the same energy are expected to occur with equal probability we calculate the coincidence 
rate R{E) to estimate T(E) for the various magnitudes of interest (vf , ip a , <^ cos /3, and <pj). 

We next note some limitations in the accuracy of the method. If To is not much smaller 
than T, error is introduced as To will not generate a good covering set of T, and it is likely 
that the method will overestimate the size of the region T. On the other hand, if T is too 
small, the number of coincidences will be small, and the statistical error in the determination 
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of R is large. There is a third source of error associated with the sample size at each energy 
or, equivalently, the total number of pairs n t (E) [T^]. The number of coincidences can be 
estimated as, 

with To such that all nt(E) configurations have been distributed among k groups. Therefore, 

1 fn t (E)\ 2 T 



2 \ k J n c (E) 

so that for a fixed minimum n c (E) to insure adequate statistics of the coincidence rate, the 
estimated value of V is bounded by n\{E). Therefore sufficiently large samples are needed 
at each energy if the corresponding value of V is large. We will further illustrate these 
limitations in Section |TV| . 

4- Reference entropy difference 

In order to place both the monomer and dimer in the same reference state, we require 
that in the limit of high E, 

SD,int{E) + S = 2S M ,int(E/2), (7) 

where S is a constant, independent of E, Sr>^ nt (E) is given by Eq. (Q) and SM,int(E) by 
Eq. fl5|). The quantity So is determined numerically as shown in Section |TV|. 

Once the constant Sq has been determined, the free energy and partition function of the 
dimer are re-scaled according to 

fi=fi-S 0i Q" D = Q' D e s \ (8) 

We can now compute the equilibrium constant K x by substituting Eq. (H) for both 
monomer and dimer into Eq. ([j]), but using the rescaled dimer partition function Q" D 
defined in Eq. (||) instead of Q' D , 

X VV^oM VV^P vdQ'm [ ) 
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With this re-definition of the dimer partition function, both Q" D and Q' M are referred to the 
same reference state, and hence absolute values of K x can be given. 

In terms of the free energies of the monomer and dimer that are obtained from the Monte 
Carlo calculation after re-weig hting (Eqs. (g) and (D), Eq. © leads to, 

hi KM) = lnc + S - f' D>i - \na D - InV^ + 2f M>i . (10) 

If lnK x > the dimer is prevalent. 



IV. RESULTS 

The method described in Section [III] has been tested on the GCN4 leucine zipper (a 
31 residue segment with the characteristic heptad repeat sequence of leucine zippers). The 
oligomerization equilibrium of the wild type has been addressed both experimentally and 
computationally ||, as well as that of several of its mutant forms JiTUip. Due to the short 
length of the sequence, and the simplicity of its secondary structure, numerous computational 
studies have addressed various aspects of the oligomerization process in GCN4, including 
dimer and multi-mer equilibria JEJ, the stability of several of its sub-domains ||, oligomeric 
equilibrium of several of its mutant forms [pD| , and other parameters of the coiled coil such 



as the helical content as a function of temperature and a van't Hoff enthalpy analysis to 
reveal the adequacy of a two state assumption for the dimerization process 0. 

We have extended the analysis of || in two directions. First we use a Replica Exchange 
Monte Carlo method instead of the Entropy Sampling Monte Carlo method of that reference 
as the former provides a faster rate of convergence to the equilibrium distribution of the 
dimer form. Second, we extend the method of calculation of the various entropy losses 
upon dimerization, and show their strong dependence on the energy of the configuration, a 
dependence that was not taken into account in previous studies. 

The results shown are based on two long runs for the monomer and dimer forms re- 
spectively. Initial configurations were chosen close to the native state, but first equilibrated 
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at constant temperature. Several runs with different initial conditions yielded essentially 
identical results for the various thermodynamic quantities presented, although none of the 
dimer simulations involved an initial condition in a manifestly anti-parallel configuration. 
The monomer runs involved 2 x 10 6 independent configurations or steps after equilibration, 
with one replica exchange attempted every 500 steps. Quantities for analysis were collected 
every 250 steps. The dimer simulation comprises two identical, and initially parallel chains 



with at least one contact between them . The simulation in this case is conducted by re- 
jecting all bond moves that would result in no contacts between the chains. The run for the 
dimer involved 1.3 x 10 6 configurations, with the same frequency of analysis and of replica 
exchange. In both cases, twenty independent replicas were run in parallel at dimensionless 
temperatures in the range T = 0.5 — 1.45 in increments of 0.05. In the low temperature 
range, the root mean squared deviation (RMSD) between the estimated location of the a 
carbons in the model and the native configuration is of the order of 3 A(see Fig. [I]). We 
note that this RMSD range constitutes a prediction, and was not enforced during the course 
of the simulation. 

The analysis presented is based on energy histograms, and the subsequent re-weighting 
described in Section |TTT| . The entire range of energies sampled by the monomer and the dimer 
during the course of the simulation was divided into 100 equal bins, so that in dimensionless 
units AE ~ 0.2926 for the monomer and AE ~ 0.3767 for the dimer. 

In the case of the dimer, the location of all individual particles was also recorded every 

(2) 

250 steps in order to estimate the configuration space volumes , <£> a , an d (p T The 

(2) 

results presented for are based on the spatial coordinates of particle 15 of chain 2 (the 
chain that is free to move within the computational cell). Substantially identical results 
follows from an analysis of any other particles in the chain, except for those in the immediate 
vicinity of the N- or C- termini. 

The configuration space volume ip a is estimated from the azimuth of the bond between 
particles 15 and 16 of chain 2, and y? C os/3 follow from cos/3, (3 being the polar angle of this 
bond. Finally, <^ 7 is obtained from the azimuth distribution of the bond between particles 15 
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and 16. In a freely rotating molecule, a is uniformly distributed in (— 7T, 7r), cos/? in (—1, 1), 
and 7 in (— n, it), resulting in a combined conformational space volume for rigid rotation of 

Figure shows our results for v} with the same energy bin size AE used to construct 
the histogram. The coarse-graining volume T = AxAyAz has been obtained by defining 
Ax = {x max — x m in)/6, and similarly for Ay and Az. x m i n and x max are the smallest and 
largest values of Xi 5 , the x coordinate of particle 15, for each particular energy bin. We 
present our results for a range of values of 5 in Fig. |2]. If 5 is too large, the coarse-graining 
volume is small, and the number of configurations for a given energy rit(E) is also small. 
As discussed in Section |T1|, this leads to underestimate the accessible volume. The value of 
Vi is seen to increase with decreasing 5, becomes approximately independent of 5 in some 
range, and then further increases with decreasing 5. If 5 is too small, the shape of the region 
being sampled cannot be accurately reproduced with this coarse r . Note that the values 

(2) 

of Vi at low energies are the most difficult to estimate, presumably because the shape of 
the region in configuration space is not as smooth as that at higher energies. However since 
the procedure leading to the computation of the reference entropy relies only on the region 
of high energies, this inaccuracy does not represent a significant limitation to our results. 

The behavior just described is qualitatively similar to that shown in Fig. |3| corresponding 
to the rotation volume <^a ( / 3 cos/3V 9 7- m this case we define Tq = AaA(cos (3) A7 with the same 
definition of Aa, A(cos/3) and A7 in terms of the quantity 5. The figure also shows (solid 
line) the value 8n 2 that corresponds to free rotation of chain 2 relative to chain 1. As can 
be seen from the figure, the values obtained approach this limit at high energies. 

The constant So of Eq. (|7|) required to place both the monomer and dimer free energies 
in the same scale is obtained directly from Eqs. ([|) and as shown in Fig. f|. In this 
figure we plot SD,int{E) + So and 2SM,int(E/2) with So adjusted graphically so that the two 
curves coincide at large E. Note that both curves superimpose to a good accuracy for a 
range of energies, indicating the consistency of the approach. 

We show next our results for K x in Fig. |5], with K x defined in Eq. (||), as a function of 
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the dimensionless temperature. When K x = 1 the mol fractions of the monomer and dimer 
forms are equal {xm — %d — 0.5). At low temperatures K x > 1, indicating a prevalence of 
the dimer form, and the reverse is true at high temperatures. For the sake of illustration, the 
figure shows the values of K x at two different concentrations Co = 10 [iM and Co = 1 mM. 

It is also intersting to examine the contact map of the dimer phase as given by the 
simulation. As discussed above, the only constraint in the simulation is that there be at 
least one contact between the two chains. Therefore the question arises as to whether 
the dimer retains a significant fraction of native contacts in the vicinity of the transition 
temperature, or whether there is a significant fraction of out of register dimer configurations 
that are structurally very different from the native state. In order to answer this question, 
we proceed as follows: A contact between residues belonging to different chains is considered 
native if it appears in the contact map of the native protein in its lattice representation. 
With this definition, GCN4 has 10 inter chain native contacts. We then calculate the 
ensemble average of the fraction of configurations that have at least 50 % native contacts. 
The results are shown in Fig. || as a function of temperature. This fraction approaches one 
at low temperature, changes quickly around the transition region, decaying to zero at high 
tempertures. We also shown in this figure the equilibrium mol fraction of the dimer form 
for the same two concentrations shown in Fig. [|. From the figure, we conclude that in this 
range of physiological concentrations, the decrease in the fraction of native contacts as given 
by our model can be mainly attributed to the appearance of the monomer form, and not to 
a significant contribution from out of register dimers. 

To conclude, our results show that it is possible to calculate the entropy loss of dimer- 
ization corresponding to the GCN4 monomer-dimer equilibrium without any restrictions to 
the motion of the individual chains. Previous research on this system differed from ours in 
that the entropy loss was estimated by restricting the conformation space of the dimer, thus 
resulting in low values of the entropy (compare the value of V/ = 67.6A 3 given in Table 1 
of ref. 0, and the values shown in Fig. |||). Despite the fact that we have allowed sampling 
of the full conformational space of the dimer, the results obtained confirm that it is possible 
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to obtain the free energy of dimerization of GCN4 at physiological concentrations by using 
a reduced model of the protein and Monte Carlo simulations. 
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APPENDIX A: LATTICE PROTEIN MODEL AND INTERACTION FORCE 

PARAMETERS 



The model protein used in this work employs a reduced representation of the protein 
backbone on a regular lattice. The model comprises a sequence of bonds connecting particles 
located at the center of mass of the corresponding residue and backbone a carbon. The 
particles are then placed in a three dimensional simple cubic lattice with spacing of 1.45 A. 
Further details on this model can be found in ref. |15[ 

A sequence of configurations is generated by a Monte Carlo scheme with Metropolis 
updating. The method employs three different types of individual transitions. In the first 
case, a single particle and its two corresponding bonds are selected for an attempted update. 
The second type of transition involves three consecutive bonds and the corresponding two 
adjacent particles. The third involves a rigid translation of a small fragment of the chain 
comprising three particles, and the ensuing rearrangement of the end bonds. These tran- 
sitions are attempted sequentially for all the bonds in the chain. Two separate transitions 
are also included to adjust the position of the N- and C-termini particles. The set of all 
these attempted transitions constitutes a Monte Carlo step (MCS). Further details about 
the transitions used in the Monte Carlo updating can be found in ref. ||22|| . 

Interaction forces can be grouped into generic and sequence specific. The former are 
sequence independent and lead to protein-like packing, whereas the latter are derived from 
a statistical analysis of the protein database, and explicitly depend on the identity of the 
aminoacids involved. We next list the values of the various parameters used in our calcula- 
tions. Sequence dependent short range interactions are defined by Eq. (1) of p2| . We use 
a common multiplicative factor in our calculations e s h or t — 0.325 (this factor is explicitly 
shown in Eq. (12) of |15[ with a value of 0.75 instead). A three-body potential that is 
sequence specific is also used, with an amplitude e^b = 0.25. The generic, short range con- 
formational biases of Eqs. (3), (4), and (5) of [|T3J involve e gen = 1.25. Hydrogen bonding 
energies within the main chain are also included, with an amplitude En-bond = 0.325 in Eq. 
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(7) of flI5| . Long range and sequence dependent interactions are modeled by a set of square 
well potentials as described in Eq. (9) of [HJ. We have chosen E rep = 4 in Eq. (8) of that 
reference, and a common multiplicative factor e pair = 2.0 (to be compared with the factor 
of 1.25 in Eq. (12) of uM)- Two additional multibody potentials are introduced to include 
hydrophobic effects, and preferences for parallel or anti-parallel packing among the residues. 
We define as the scale of Eq. (10) in |L5] e sur f ace = 0.75 (instead of the value 0.5 shown in 



Eq. (12) of that reference). Finally, we have used a factor e mu iu = 0.75 in Eq. (11) of fl5 
(instead of the value 0.5 in Eq. (12)). 
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FIGURES 
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FIG. 1. Histogram of the sampled root mean squared deviation from native (RMSD in A 3 at 
the dimensionless temperature T = 0.6.) . 
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FIG. 2. Volume of configuration space (in A 3 ) that is accessible to rigid translation of 
chain 2 of the dimer as a function of its energy. Several different choices of the coarse-graining 
volume Tq are shown in the figure as indicated by the values of 5. 
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FIG. 3. Volume of configuration space accessible to rotation of chain 2 of the dimer (p a (p cos gip^ 
for four different choices of the coarse-graining volume, as indicated by the values of d in the figure. 
The straight line corresponds to the value associated with free rotation, 87r 2 . 
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FIG. 4. Rescaled internal entropy of the dimer So,int + Sq, with Sq = 52 (squares), and twice 
the internal entropy of the monomer 2SM,int{E/2) (circles). The value of Sq has been graphically 
determined in order to make the two curves coincide in the range of large E. As can be seen from 
the figure, the two curves superimpose quite accurately for a significant range of energies. 
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FIG. 5. Equilibrium reaction constant for monomer-dimer equilibrium as a function of dimen- 
sionless temperature and for the two concentrations indicated. 
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FIG. 6. Equilibrium mol fraction of the dimer as a function of temperature. 
xd = (l + 4J£ X — y/T+ 8K X ) /4:K X , for the two concentrations indicated. The value of the equi- 
librium constant K x is shown in Fig. ^. We also show the fraction of configurations that, at the 
given temperature, had at least 50% native contacts. 
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